Contextual analysis in word-for-word MT
Abstract
Experiments with word-for-word MT of Russian scientific literature have given results which, except for such limited purposes as indexing, are far from satisfactory. The difficulty is not so much one of word order as of the syntactic and semantic ambiguity of individual words. Regardless of how inflected forms are treated, for example, it is impossible in the majority of instances to identify the grammatical case of a Russian noun. In addition to syntactic ambiguity, multiple equivalents must be assigned to a large percentage of words (an estimated 45% of the running words in a physics text). The chief disadvantage of word-for-word MT, then, is its prolixity: the reader is confronted with a burdensome multiplicity of potential equivalents (syntactic and semantic) for several words in each sentence.
The chief cause of this ambiguity is that each word is examined in isolation, as a discrete item. The human translator operates with the tremendous advantage of something called "context". Broadly speaking, context signifies environment: surrounding words, sentences, and even the subject area itself. Investigation shows that restricted contextual analysis, performed routinely, can resolve most of the problems of ambiguity. Remarkable clarification is attained even when the comparison of a given ambiguous word x is limited to the immediately contiguous word in the sentence (the pre-x or post-x word).
Without attempting to rearrange the word order of the Russian sentence, one can obtain the following by comparing each ambiguous word with the coded grammatical features or semantic class of contiguous words:
a) Syntactic clarification. The ambiguity of case forms in nouns can be reduced to an insignificant percentage, and proper English equivalents can be supplied in the form of English prepositions as demanded by the genitive, dative, and instrumental cases. Such prepositions can be withheld in translation when the requirements of Russian grammar demand it. Participles and adverbs which are indistinguishable in form from adjectives can be given the correct equivalent; the comparative degree of adjectives and adverbs can be adequately handled. In general, there are no serious problems of syntax which cannot be resolved by reference to the grammatical features of pre- or post-words.
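The syntactic clarification described above can be illustrated with a minimal sketch. It is not the procedure of the original paper: the toy lexicon, the case codes, the function names, and the rule that a noun directly following another noun is read as genitive are all assumptions made for illustration. The sketch narrows the possible cases of an ambiguous noun form by inspecting only the pre-word, supplies an English preposition for a resolved oblique case, and withholds it when a Russian preposition already governs the noun.

```python
from typing import Optional, Set

# Toy lexicon (assumed for illustration): possible cases of ambiguous noun forms.
NOUN_CASES = {"воды": {"gen", "nom", "acc"}, "силы": {"gen", "nom", "acc"}}
NOUN_GLOSS = {"воды": "water", "силы": "force"}

# Case governed by a preceding Russian preposition (toy data).
PREPOSITION_GOVERNS = {"без": "gen", "к": "dat"}

# English preposition supplied for a bare oblique case (assumed mapping).
CASE_PREPOSITION = {"gen": "of", "dat": "to", "ins": "by"}


def resolve_case(noun: str, pre_word: Optional[str] = None) -> Set[str]:
    """Narrow the candidate cases of `noun` using only the pre-word."""
    candidates = NOUN_CASES[noun]
    if pre_word in PREPOSITION_GOVERNS:
        governed = PREPOSITION_GOVERNS[pre_word]
        if governed in candidates:
            return {governed}
    elif pre_word in NOUN_CASES:
        # Assumed heuristic: a noun right after another noun is genitive.
        if "gen" in candidates:
            return {"gen"}
    return set(candidates)


def with_case_preposition(gloss: str, case: str) -> str:
    """Prefix the gloss with the English preposition for an oblique case."""
    prep = CASE_PREPOSITION.get(case)
    return f"{prep} {gloss}" if prep else gloss


def render_noun(noun: str, pre_word: Optional[str] = None) -> str:
    """Return the English rendering of `noun` given its pre-word."""
    gloss = NOUN_GLOSS[noun]
    cases = resolve_case(noun, pre_word)
    if len(cases) > 1:
        # Unresolved: fall back to listing every alternative.
        options = sorted({with_case_preposition(gloss, c) for c in cases})
        return "(" + " / ".join(options) + ")"
    (case,) = cases
    if pre_word in PREPOSITION_GOVERNS:
        # The Russian preposition is translated separately, so withhold ours.
        return gloss
    return with_case_preposition(gloss, case)


print(render_noun("воды"))          # (of water / water)
print(render_noun("воды", "без"))   # water
print(render_noun("воды", "силы"))  # of water
```

With no context the output keeps both alternatives, which is the prolixity the abstract criticizes; a single contiguous word is enough to select one reading in the other two calls.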
Journal: Mechanical Translation
Volume: 3
Publication date: 1956